Acquisition of Phraseological Units from Linguistically Interpreted Corpora: A Case Study on German PP-Verb Collocations
Abstract
In this paper, we show that access to syntactic information eases collocation extraction from corpora and supports the identification of lexical and structural restrictions related to collocations. For collocation identification, we use a corpus that is automatically annotated with a part-of-speech tagger and a phrase chunker.
Similar resources
Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German
The usefulness of a statistical approach suggested by Church and Hanks (1989) is evaluated for the extraction of verb-noun (V-N) collocations from German text corpora. Some motivations for the extraction of V-N collocations from corpora are given and a couple of differences concerning the German language are mentioned that have implications on the applicability of extraction methods developed f...
CDB - A Database of Lexical Collocations
CDB is a relational database designed for the particular needs of representing lexical collocations. The relational model is defined such that competence-based descriptions of collocations (the competence base) and actually occurring collocation examples extracted from text corpora (the example base) complete each other. In the paper, the relational model is described and examples for the repre...
Experiments on Candidate Data for Collocation Extraction
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).
Using chunked corpora for the acquisition of collocations and idiomatic expressions
This paper discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with ...
Towards a corpus-based dictionary of German noun-verb collocations
We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...
Publication date: 1998